fix(coprocessor): host-listener, add timeout to get logs calls#1723
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds timeout protection to get_logs RPC calls to prevent the host-listener from hanging indefinitely when encountering buggy node messages. The timeout mechanism ensures that the listener can recover and retry connections rather than becoming stuck.
- Introduces a 120-second timeout constant for websocket get_logs operations
- Wraps two
provider.get_logs()calls withtokio::time::timeout - Implements appropriate error handling and retry logic for timeout scenarios
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
🧪 CI InsightsHere's what we observed from your CI run for 1654b94. 🟢 All jobs passed!But CI Insights is watching 👀 |
2f583dd to
14f3a67
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
14f3a67 to
d9baaf5
Compare
4a5848c to
2fe4586
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
From a case seen on testnet where the host-listener was stuck by alloy looping on bad node message. The timeout will cut that loop and force host-listener to retry and reconnect.
2fe4586 to
1654b94
Compare
|
@claude /pr-review |
|
Claude finished @rudy-6-4's task in 4m 56s —— View job PR Review Complete ✅
Summary: This PR correctly implements timeout protection for WebSocket RPC calls in the host-listener to prevent infinite loops from buggy node messages. The implementation uses proper The review has been posted as a PR comment. See the review comment for full details. |
There was a problem hiding this comment.
[AUTOMATED]
Code Review Summary
Overall Assessment: ✅ LGTM - The implementation is correct and addresses the stated problem effectively.
What This PR Does
Adds timeout protection to WebSocket RPC calls in the host-listener service to prevent the application from hanging indefinitely when the blockchain node sends problematic messages. This is a fix for a production issue observed on testnet.
Review Highlights
Strengths:
- Correctly uses
tokio::time::timeout()to wrap all critical RPC calls - Configurable timeout via
--timeout-request-websocketCLI argument (default 15s) - Proper error handling patterns using
anyhow::bail! - Existing retry mechanisms accommodate the new timeout error paths seamlessly
- Good addition of
self.stream = Noneon failure to trigger reconnection - Tests updated with the new required argument
Implementation Quality:
- No race conditions or concurrency issues detected
- No resource leaks -
tokio::time::timeouthandles cancellation correctly - Error handling is consistent with existing patterns
- Code follows Rust conventions and should pass clippy
Minor Observations (Non-blocking)
Nit: The functions get_block_by_id() (line 556) and get_block() (lines 592-594) could benefit from a final summary error log when all retries are exhausted, for better observability. Currently, individual retry attempts are logged, but there's no clear "gave up after N attempts" message. However, callers do log these failures, so this is a minor observability gap.
Existing Review Comments Addressed
- Copilot suggested updating the help text to be more accurate about which RPC calls are covered - I see this has been addressed
- The timeout was made configurable (as suggested by @obatirou) with a 15s default
- Error logging patterns are generally consistent
Verified against guidelines: Rust 1.91.0, anyhow for errors, tokio for async, conventional commits format ✅
Merge Queue Status🚫 The pull request has left the queue (rule: This pull request spent 1 hour 3 minutes 17 seconds in the queue, including 2 minutes 1 second running CI. Required conditions to merge
ReasonThe merge conditions cannot be satisfied due to failing checks Failing checks: HintYou may have to fix your CI before adding the pull request to the queue again. |
❌ The pull request has been removed from the queue
|
Merge Queue Status🟠 Waiting for queue conditions Required conditions to enter a queue
|
|
@Mergifyio queue |
❌ The pull request has been removed from the queue
|
|
@Mergifyio queue |
❌ The pull request has been removed from the queue
|
|
@Mergifyio requeue |
✅ This pull request will be re-embarked automaticallyDetailsThe followup |
✅ The pull request has been merged automaticallyDetailsThe pull request has been merged automatically at 45525a7 |
From a case seen on testnet where the host-listener was stuck by alloy looping on bad node message. The timeout will cut that loop and force host-listener to retry and reconnect.